METHOD AND APPARATUS FOR REDUCING RATE DETERMINATION 
ERRORS AND THEIR ARTIFACTS 

Field of the Invention

The present invention relates generally to communication systems and, more
particularly, to a method and apparatus for reducing rate determination errors in a
communication system and for mitigating the audio artifacts that result from any
remaining rate determination errors.

Background of the Invention 

Within Code Division Multiple Access (CDMA) and other types of communication
system, information, either voice or data, is carried between communication
resources, e.g., a radio telephone and a base station, on a communication channel.
Within broadband, spread spectrum communication systems, such as CDMA based
communication systems in accordance with Interim Standard IS-95B, a spreading code
is used to define the communication channel.

CDMA systems can transmit user information at variable rates. For example, in
voice calls the data rate of each speech frame is varied based on the speech activity.
When a user is speaking, compressed speech information is typically sent at full rate.
Between words and sentences the data rate is typically reduced to eighth rate. Half and
quarter rates are also used for speech-to-quiet transitions and when data rate
reductions are required, such as to allow for multiplexing of signaling information or
to increase system capacity. In data service calls, full, half, quarter and eighth rate
frames can be selected based on the data rate of the user-requested information.

To protect against data corruption on the air interface, mobile communication
systems typically employ Forward Error Correction (FEC) techniques. In the base site
to mobile subscriber unit direction, termed the forward link, IS-95 includes the
addition of Cyclic Redundancy Check (CRC) bits, convolutional encoding, data
repetition and interleaving. Data repetition is applied to subrate frames (half, quarter
and eighth rate) after convolutional encoding, resulting in a constant data rate on the
air interface.

In CDMA communication systems the receiver does not know a priori the data rate
of a received frame. The receiver has to apply the decoding mechanism for each of the
allowable frame rates and examine certain characteristics of the received data to
determine the rate at which the frame was most probably transmitted. Characteristics
that are usually employed are the Symbol Error Rate (SER), CRC verification and
Viterbi decoder Quality bits. SER is an estimate of the number of symbol errors in the
convolutionally coded data. It is obtained by re-encoding the information sequence
recovered by convolutional decoding and counting the re-encoded channel symbols
that differ from the received symbols. Some of the frame rates, namely full and half
rate for IS-95, are protected by a CRC codeword. The transmitter generates this
codeword by performing a type of degenerate cyclic coding on the data, and the
resulting CRC is convolutionally encoded and transmitted with the data. The receiver
also generates the CRC of the received convolutionally decoded data and compares it
with the CRC appended by the transmitter. Viterbi decoders are typically used for
convolutional decoding. In addition to the decoded data sequence, they sometimes
provide a Quality bit indicating whether a decoded sequence deviated excessively from
a valid data sequence.
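The SER estimate described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a receiver implementation. It assumes a rate-1/2, constraint-length-9 encoder with generator polynomials 753 and 561 (octal), which are commonly cited for the IS-95 forward link. It also assumes hard-decision received symbols and a decoder that has already produced the information bits. The names `conv_encode` and `symbol_errors` are hypothetical.

```python
# Assumed code parameters: rate 1/2, constraint length K = 9,
# generator polynomials 753 and 561 (octal).
G0, G1, K = 0o753, 0o561, 9


def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def conv_encode(bits: list[int]) -> list[int]:
    """Convolutionally encode 0/1 bits, producing two channel symbols per bit."""
    reg = 0
    symbols = []
    for b in bits:
        reg = ((reg << 1) | b) & ((1 << K) - 1)  # keep the last K input bits
        symbols.append(parity(reg & G0))
        symbols.append(parity(reg & G1))
    return symbols


def symbol_errors(decoded_bits: list[int], received_symbols: list[int]) -> int:
    """Re-encode the decoded bits and count symbols that differ from those received."""
    reencoded = conv_encode(decoded_bits)
    return sum(a != b for a, b in zip(reencoded, received_symbols))


# Example: the 16-bit eighth rate frame 0740H plus 8 zero tail bits (tail assumed).
frame = [(0x0740 >> (15 - i)) & 1 for i in range(16)] + [0] * 8
tx = conv_encode(frame)
rx = list(tx)
rx[3] ^= 1   # inject two channel symbol errors
rx[10] ^= 1
print(len(tx), symbol_errors(frame, rx))  # 48 2
```

A real SER is computed against the decoder's own output, so decoder errors also show up in the count. The sketch isolates only the re-encode-and-compare step.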

The decision as to which rate was employed by the transmitter is typically made by
the receiver's rate determiner using a Rate Determination Algorithm (RDA). The
determiner uses the decoding characteristics from each of the decoders to determine
the rate at which the received frame was transmitted and/or whether the frame is
usable. If the frame contains too many bit errors or its rate cannot be determined, the
frame is declared an erasure. An RDA typically follows a series of rules to determine
the rate. For example, two such rules could be:

    IF CRCfull = TRUE AND SERfull <= SERfull_threshold
    THEN FRAME RATE = FULL

    IF CRCfull = FALSE AND SERfull > SERfull_threshold
       AND CRChalf = FALSE AND SERhalf > SERhalf_threshold
       AND SEReighth < SEReighth_threshold
    THEN FRAME RATE = EIGHTH
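The two example rules can be expressed directly in code. This is a sketch only. The threshold values are hypothetical, since the text gives none, and a complete RDA would include further rules (for half rate and for erasures) that are omitted here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold values, chosen only for illustration.
SER_FULL_THRESHOLD = 40    # SERfull_threshold
SER_HALF_THRESHOLD = 30    # SERhalf_threshold
SER_EIGHTH_THRESHOLD = 8   # SEReighth_threshold


@dataclass
class DecodeMetrics:
    crc_full: bool   # CRCfull: full rate CRC passed
    ser_full: int    # SERfull
    crc_half: bool   # CRChalf
    ser_half: int    # SERhalf
    ser_eighth: int  # SEReighth


def example_rules(m: DecodeMetrics) -> Optional[str]:
    """Apply the two example RDA rules; return None if neither rule fires."""
    if m.crc_full and m.ser_full <= SER_FULL_THRESHOLD:
        return "FULL"
    if (not m.crc_full and m.ser_full > SER_FULL_THRESHOLD
            and not m.crc_half and m.ser_half > SER_HALF_THRESHOLD
            and m.ser_eighth < SER_EIGHTH_THRESHOLD):
        return "EIGHTH"
    return None


print(example_rules(DecodeMetrics(True, 12, False, 55, 20)))  # FULL
```

The first rule never consults the eighth rate metrics. Any frame whose full rate CRC passes with a low SERfull is accepted as full rate, and this is what leaves room for the falsing discussed next.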
Although RDAs typically do a good job of distinguishing between frame rates, they
are still subject to falsing. For example, a frame that was transmitted as an eighth rate
frame can be incorrectly interpreted by the receiver as a full rate frame. The effects of
such mis-determined rates can be severe. They sometimes result in pronounced audio
artifacts in voice calls and a reduction in data throughput for data calls. The falsing rate
has been found to depend on many variable factors. These include the content of the
frame being transmitted, interference conditions on the air interface and the
performance of the receiver's determiner. The FEC protocols used in IS-95 and known
in the art have also been found to be non-optimal in providing adequate code distance
between a transmitted subrate frame and the nearest possible full rate frame.
For example, when presented with silence, the Enhanced Variable Rate Codec
(EVRC) used in CDMA systems has been observed to converge on the 16-bit eighth
rate frame 0740H and to repeat this frame over and over. Simulations of the IS-95 FEC
scheme show that this eighth rate frame could be decoded by a full rate decoder with a
very low SER after it passes through the eighth rate convolutional encoder and data
repeater. When the encoded frame is punctured by power control bits and suffers a few
bit errors on the air interface, it has been observed that the CRC can also pass. As
shown by the determiner rules above, a CRC pass combined with a low SER is typically
sufficient for the received frame to be declared a good full rate frame.

The severity of the resulting audio effects depends primarily on the contents of the
received false full rate frame. In particular, it depends on whether those contents
correspond to high audio gains, high frequencies, etc., after speech decoding. However,
error mitigation techniques that are used to reduce the audio effects of air interface
erasures have been found to aggravate the audio artifact.

Thus, there is a need for a method and apparatus for reducing rate determination
errors and their audio effects in a communication system.


Brief Description of the Drawings

FIG. 1 is a block diagram of a wireless communication system.

FIG. 2 is a block diagram of the error correction functions within a wireless unit in
accordance with the preferred embodiment of the present invention.

FIG. 3 is a diagram of a variable rate data stream in accordance with the preferred
embodiment of the present invention.

FIG. 4 is a flow diagram of the operation of a rate determination and error
mitigation algorithm in accordance with the preferred embodiment of the present
invention.

FIG. 5 is a block diagram of a speech decoder reset mechanism in accordance with
the preferred embodiment of the present invention.

FIG. 6 is a diagram illustrating the audio artifacts incurred after a misdetermination
with and without the preferred embodiment of the present invention.



Detailed Description of the Preferred Embodiments

The present invention provides a method and apparatus for improving the quality
of an audio signal in a communication system. The method includes determining the
validity of the frame rate of a speech frame and modifying the state of at least one
speech decoder filter based on the validity determination. Applicable speech decoder
filters include, but are not limited to, the pitch filter, the vocal tract filter and the
post-filter. The validity determination can be based on comparing the frame rate of the
current frame with that of previously received frames. In particular, if an eighth rate
frame is received after a full rate frame that did not contain signaling information, the
earlier full rate frame is deemed to be invalid. The invention also allows symbol error
thresholds to be adjusted based on the number of consecutive frames of the same frame
rate. Adjusting these thresholds reduces the number of rate determination errors and
hence improves the audio quality of the resulting speech.

The present invention provides an apparatus that includes means for determining
the validity of a frame rate and a speech decoder capable of modifying, including
resetting, its filter states based on the validity determination. The present invention
also provides means for adjusting symbol error thresholds based on the number of
consecutive frames with the same frame rate.
FIG. 1 generally depicts a communication system in accordance with the preferred
embodiment of the present invention. As shown in FIG. 1, a Base Site Controller (BSC)
10 is in communication with a Mobile Switching Center (MSC) 12, which is in turn in
communication with the PSTN 8. In the preferred embodiment, the communication
system is a Code Division Multiple Access (CDMA) cellular radiotelephone system.
However, it will be recognized by those of ordinary skill in the art that any suitable
communication system may utilize the invention.

BSC 10 includes a speech encoder 20, a processor 22 and a multiplexer (MUX) 24.
The speech encoder 20 receives speech samples at a data rate of 64 kbit/s from the
MSC 12. It uses speech compression algorithms that are well known in the art, such as
the Enhanced Variable Rate Codec (EVRC), to reduce the data rate. Speech encoder 20
includes a rate selector 26 that selects the data rate at which each 20 ms portion of the
received speech is encoded. The data rate of the resulting compressed speech frame
typically depends on the level of speech activity within the sampled speech. In the case
of EVRC there are three valid frame rates: full, half and eighth rate. Typically, full rate
frames are produced when active speech is occurring and eighth rate frames are
produced during quiet periods. Half rate frames are typically produced during
speech-to-quiet transitions or when commanded by the MUX 24. For EVRC, a full rate
speech frame followed directly by an eighth rate speech frame is not allowed. Hence all
speech-to-quiet transitions include a half rate speech frame.

Processor 22 is responsible for generating and terminating signaling messages
exchanged with the mobile unit 70. The MUX 24 multiplexes these signaling messages
with the encoded speech frames from speech encoder 20 and with some additional
control information to form full, half or eighth rate traffic frames. The additional
control information includes a parameter specifying the traffic frame rate. The traffic
frames are then sent via communication link 28 to the Base Transmitter Site (BTS) 30.
The traffic frames are received by the packet terminator 32, which generates a
control signal 34 indicative of the traffic frame rate. A switch 36 controlled by the
control signal 34 determines whether a full rate CRC 38, a half rate CRC 40 or no CRC
41 is appended to the traffic frame. The traffic frames are then passed through a 1/2
rate convolutional encoder 42 before being presented to the data repeater 44. The data
repeater takes subrate frames, such as half and eighth rate frames, and upsamples them
so that all frames contain the same number of bits. In the case of eighth rate frames,
every received bit is repeated seven times, so that it appears eight times in total.
Similarly, every bit is repeated once for half rate frames. After the data repeater 44,
every frame contains 384 bits.
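The arithmetic that makes every frame 384 symbols long can be checked with a short sketch. The per-rate bit budgets below are assumed from IS-95 Rate Set 1 rather than taken from the text. The 172, 80 and 16 information bits match the decoder outputs described for FIG. 2. The 12-bit and 8-bit CRCs and the 8 tail bits are assumptions. Quarter rate is included for completeness even though EVRC does not use it.

```python
# Assumed IS-95 Rate Set 1 forward traffic channel layout per 20 ms frame:
# rate -> (information bits, CRC bits, encoder tail bits, repetition factor)
FRAME_LAYOUT = {
    "full": (172, 12, 8, 1),
    "half": (80, 8, 8, 2),
    "quarter": (40, 0, 8, 4),
    "eighth": (16, 0, 8, 8),
}
SYMBOLS_PER_BIT = 2  # rate-1/2 convolutional encoder


def repeat_symbols(symbols, factor):
    """Data repeater: emit each symbol `factor` times in a row."""
    return [s for s in symbols for _ in range(factor)]


def symbols_after_repetition(rate):
    """Number of symbols per frame after encoding and repetition."""
    info, crc, tail, factor = FRAME_LAYOUT[rate]
    encoded = (info + crc + tail) * SYMBOLS_PER_BIT
    return encoded * factor


for rate in FRAME_LAYOUT:
    print(rate, symbols_after_repetition(rate))  # 384 for every rate
```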

The frames are then passed through a data interleaver 46, which reorders the data
in a predetermined order. This improves the resilience of the frame to burst errors on
the air interface 60. Next, 32 bits in predetermined positions within the frame are
replaced by power control information bits. This process is performed by the power
control puncturing function 48. The resulting frame is passed to the power amplifier 50
for transmission over the air interface 60. The transmission power used for the frame is
partly dependent on the control signal 34. The frame is then received, probably with
bit errors, by the mobile unit 70.

FIG. 2 depicts the error correction functions within the mobile unit 70 of FIG. 1.
The deinterleaver 102 receives 384 symbols from the RF front end 100. Each symbol is
a confidence level that the corresponding transmitted bit was a 0 or a 1. These
confidence levels are termed soft decision values. For example, in a 4-bit soft decision
system, 0000 could represent a very high probability that a transmitted bit was a 0 and
1111 could represent a very high probability that the bit was a 1. A value of 1001
would suggest that the transmitted bit was a 1, but that the confidence of the RF front
end 100 is low. The deinterleaver 102 restores the original symbol order and presents
the frame to multiple decode paths. A decode path exists for each possible traffic frame
rate at which the MUX 24 of FIG. 1 could have originally sent the received frame. The
multiple decode paths are necessary because the receiver does not know a priori the
traffic frame rate. In the case of EVRC there are three possible frame rates: full, half
and eighth rate.

The eighth rate decode path consists of a 1/8th rate combiner 104 and a
convolutional decoder 106. The eighth rate combiner 104 combines each group of 8
consecutive symbols into one symbol to compensate for the data repetition introduced
by the data repeater 44 of FIG. 1. The convolutional decoder 106 corrects errors in the
frame and outputs 16 data bits and an estimate of the symbol error rate, SEReighth. The
half rate decode path consists of a half rate combiner 110, a convolutional decoder 112
and a CRC check 114. The convolutional decoder 112 outputs 80 data bits, SERhalf and
the received CRC. The CRC is checked by the CRC check 114, and the result, CRChalf,
is passed to the rate determination algorithm (RDA) of the determiner 150. The full rate
decode path consists of a convolutional decoder 120 and a CRC check 122. The
convolutional decoder 120 outputs 172 data bits, SERfull and the received CRC. The
CRC is checked by the CRC check 122, and the result, CRCfull, is passed to the
determiner 150. The determiner 150 determines the rate of the transmitted frame and
selects the appropriate decoded frame for transmission to a speech decoder 155. The
speech decoder 155 is responsible for decompressing the received speech frame using
speech algorithms known in the art. The decompression algorithm is dependent on the
frame rate.
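The soft decision values and the eighth rate combiner 104 can be illustrated together. In this sketch, each 4-bit value is mapped to a signed metric (0000 maps to -15, confident 0; 1111 maps to +15, confident 1), and each group of repeated symbols is summed. Both the mapping and the plain summation are assumptions chosen for clarity. Practical combiners may scale or quantize differently, and the combined metrics would feed the Viterbi decoder rather than a hard decision. The function names are hypothetical.

```python
def to_metric(soft4: int) -> int:
    """Map a 4-bit soft value (0 = confident 0, 15 = confident 1) to -15..+15."""
    if not 0 <= soft4 <= 15:
        raise ValueError("soft value must fit in 4 bits")
    return 2 * soft4 - 15


def combine(soft_symbols: list[int], factor: int) -> list[int]:
    """Sum each group of `factor` consecutive soft symbols (8 for eighth rate)."""
    if len(soft_symbols) % factor:
        raise ValueError("frame length must be a multiple of the repetition factor")
    metrics = [to_metric(s) for s in soft_symbols]
    return [sum(metrics[i:i + factor]) for i in range(0, len(metrics), factor)]


def hard_decisions(combined: list[int]) -> list[int]:
    """A positive combined metric favors a transmitted 1."""
    return [1 if m > 0 else 0 for m in combined]


# Seven confident 0s and one weak 1 (1001), then a mostly confident group of 1s.
groups = combine([0, 0, 0, 0, 0, 0, 0, 0b1001, 15, 14, 15, 9, 12, 15, 15, 0], 8)
print(groups, hard_decisions(groups))  # [-102, 70] [0, 1]
```

Combining lets the repeated copies outvote an occasional weak or wrong symbol. A full 384-symbol frame combined with factor 8 yields the 48 symbols that the eighth rate decoder works on.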

The SER and CRC parameters, as well as their use in determining the rate of a
frame, are well known in the art. However, as previously mentioned, the determiner 150
is prone to falsing and can sometimes mis-determine the rate of a frame. In accordance
with the preferred embodiment of the invention, the determiner 150 includes additional
logic for reducing mis-determinations and for reducing the audio effects when
mis-determinations occur. In accordance with the preferred embodiment of the present
invention, a control signal 160 is provided from the determiner 150 to the speech
decoder 155. The control signal 160 commands the speech decoder 155 to reset its
internal digital filters when the determiner 150 believes that the previously received
frame was mis-determined.

For EVRC, as well as other variable rate vocoders known in the art, a direct
transition from full rate to eighth rate is not allowed. The standards require that at least
one half rate frame be transmitted between any transition from full rate to eighth rate.
FIG. 3 shows an example of a typical transition from full rate to eighth rate, as well as
a transition induced by a frame rate misdetermination. A series of full rate frames
200-206, corresponding to speech activity, was transmitted by the BTS 30 and correctly
received by the determiner 150. During the transition to quiet, the speech encoder 20
generated a half rate frame 208 to satisfy the rate transition rules imposed by the
vocoder algorithm, and the determiner 150 received it correctly. Following the half rate
frame 208, a series of eighth rate frames 210-220 is correctly received. Frame 222 was
originally generated by the speech encoder 20 as an eighth rate frame, but the
determiner 150 has mis-determined it as a full rate frame. When a frame rate is
misdetermined by the determiner 150, the speech decoder 155 will be presented with a
single full rate frame 222 after a series of eighth rate frames 210-220, followed by a
second series of eighth rate frames 226-232. The speech decoder 155, however, requires
that a half rate frame 224 be received between any full rate to eighth rate transition. As
a result, the speech decoder 155 will declare the following valid eighth rate frame 226
an erasure, as known in the art. In an alternative embodiment, the determiner 150 may
recognize the rate step-down violation and declare the frame an erasure. The erasure
forced by the vocoder algorithm prolongs any audio anomaly produced by the original
misdetermination. This is because vocoder erasure processing, as known in the art,
reuses parametric information from the frame received prior to the erasure frame. In the
case of a misdetermination, the reused parameters originate from the corrupt
misdetermined frame, and thus the effect of the bad frame is extended.
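The rate step-down check that leads to the forced erasure can be sketched as a scan over the frame sequence of FIG. 3. The helper name and the (reference numeral, rate) representation are hypothetical. The rule itself, that no eighth rate frame may directly follow a full rate frame, is the one stated above.

```python
def step_down_violations(frames):
    """Return reference numerals of eighth rate frames that directly follow a
    full rate frame, a step that EVRC's rate transition rule does not allow."""
    return [num for (_, prev_rate), (num, rate) in zip(frames, frames[1:])
            if prev_rate == "full" and rate == "eighth"]


# Frame sequence of FIG. 3 as presented to the speech decoder after the
# misdetermination (frame 222 falsely declared full rate; no half rate 224).
frames = ([(n, "full") for n in (200, 202, 204, 206)]
          + [(208, "half")]
          + [(n, "eighth") for n in range(210, 222, 2)]
          + [(222, "full")]
          + [(n, "eighth") for n in range(226, 234, 2)])
print(step_down_violations(frames))  # [226]
```

The legitimate transition through half rate frame 208 is not flagged. Only the valid eighth rate frame 226 is flagged, which is exactly the frame the vocoder would otherwise erase.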

An improved determiner 150 is introduced, which is composed of two parts. The first
part adjusts the SER thresholds used by the determiner 150 based on the frame rate
history. After a period of T8 consecutive eighth rate frames, the SER threshold for full
rate frames could be lowered from SERFT1 to SERFT2. Subsequent full rate frames
would then have to be received with higher frame quality, as measured by the SERfull
received from the full rate convolutional decoder 120. Additionally, the eighth rate SER
threshold could be raised from SERET1 to SERET2. Subsequent eighth rate frames
could then be received with lower frame quality, as measured by the SEReighth received
from the eighth rate convolutional decoder 106. The second part of the improved
determiner 150 introduces a control path to the speech decoder 155 to allow for filter
state cleanup within the vocoder algorithm. This is beneficial for minimizing the audio
impact of any misdeterminations that persist.

FIG. 4 is a flow diagram that shows the operation of the improved determiner 150
in more detail. The process starts at step 300, where the full rate CRC received from the
full rate CRC check 122 is tested for a pass/fail condition. If CRCfull fails the validity
test, the frame is removed from consideration as a possible full rate frame and the logic
flow proceeds to step 316 to check the validity of other frame rates. If CRCfull passes
the validity test, the logic flow proceeds to step 302, where SERfull, received from the
full rate convolutional decoder 120, is evaluated. If SERfull exceeds the nominal
threshold SERFT1, the frame is removed from consideration as a possible full rate frame
and the logic flow proceeds to step 316 to check the validity of other frame rates. If
SERfull is less than or equal to the nominal threshold SERFT1, the logic flow proceeds
to step 304, where the frame is evaluated to determine whether it contains signaling
traffic. Some frames carry critical call processing information in the form of signaling
traffic, and this check prevents those frames from being subjected to the stricter SERFT2
threshold test in step 308. For the IS-95B CDMA standard, this information is contained
in the first few bits of the convolutionally decoded frame. It takes the form of a
mixed-mode bit (MM bit), a traffic type bit (TT bit) and a pair of traffic mode bits
(TM bits). The definitions and usage of these bits are well known in the art.

Returning to step 304, if the frame is determined to contain signaling information,
the frame is considered a valid full rate frame and the logic flow proceeds to step 312.
If the frame does not contain signaling information, the logic flow proceeds to step
306, where the consecutive eighth rate frame counter C8 is compared to the threshold
T8. If C8 is less than or equal to the threshold T8, the stricter secondary threshold
SERFT2 is not checked and the logic flow proceeds to step 310, where the frame is
declared to be a valid full rate frame. If C8 is greater than the threshold T8, the logic
flow proceeds to step 308, where SERfull, received from the full rate convolutional
decoder 120, is compared to the stricter secondary threshold SERFT2. This secondary
threshold makes it more difficult, in terms of the allowed number of symbol errors, for a
non-signaling full rate frame to be declared valid. The first full rate frame, or series of
full rate frames, following an extended interval of eighth rate frames must therefore have
a lower symbol error rate than is normally required.

If in step 308 SERfull exceeds the threshold SERFT2, the frame is removed from
consideration as a full rate frame and the logic flow proceeds to step 316, where other
frame rates will be checked. If SERfull is less than or equal to SERFT2, the logic flow
proceeds to step 310. There, the consecutive eighth rate frame counter C8 is reset to zero
and the consecutive full rate frame counter CF is incremented. The logic flow continues
to step 312, where the frame rate is set to full rate.

If the frame could not be validated as a full rate frame, the logic flow follows one
of the paths to step 316, where the frame's half rate validity is considered. In step 316,
the half rate CRC received from the half rate CRC check 114 is tested for a pass/fail
condition. If CRChalf fails the validity test, the frame is removed from consideration as
a possible half rate frame and the logic flow proceeds to step 324 to check the validity
of other frame rates. If CRChalf passes the validity test, the logic flow proceeds to step
318, where SERhalf, received from the half rate convolutional decoder 112, is evaluated.
If SERhalf is less than or equal to the threshold SERHT, the logic flow proceeds to step
320, where the consecutive eighth rate frame counter and the consecutive full rate frame
counter are reset to zero. The logic flow then proceeds to step 322, where the frame rate
is set to half rate. If in step 318 SERhalf exceeds the threshold SERHT, the frame is
removed from consideration as a half rate frame and the logic flow proceeds to step 324,
where other frame rates will be checked.
If the frame could not be validated as a full rate or half rate frame, the logic flow
follows one of the paths leading to step 324. In step 324, SEReighth, received from the
eighth rate convolutional decoder 106, is evaluated. If SEReighth is less than or equal to
the normal threshold SERET1, the logic flow proceeds to step 334. If SEReighth exceeds
the normal threshold SERET1, the logic flow proceeds to step 326, where the consecutive
eighth rate frame counter C8 is compared to the threshold value T8. If C8 is less than or
equal to T8, the logic flow proceeds to step 330 and the frame is declared an erasure,
since it could not adequately be qualified as a full rate, half rate or eighth rate frame. If
C8 exceeds the threshold T8, the logic flow proceeds to step 328, where SEReighth is
compared against the relaxed threshold SERET2. If SEReighth exceeds the relaxed
threshold SERET2, the logic flow proceeds to step 330, where the consecutive full rate
frame counter is reset to zero. It then proceeds to step 332, where the frame is declared
an erasure frame. If SEReighth is less than or equal to the relaxed threshold SERET2, the
logic flow proceeds to declare the frame rate as eighth. This begins with step 334, where
the value of the consecutive full rate frame counter is evaluated.

In this preferred embodiment, the full rate counter CF may have a value of 1, which
indicates that only a single full rate frame was received prior to the current eighth rate
frame. In that case the logic flow proceeds to step 336, where the vocoder filter reset
indication is activated. This is because the previously received frame was probably
incorrectly declared to be a full rate frame. If CF has a value other than 1, the logic flow
skips step 336 and proceeds to step 338. At step 338 the consecutive full rate counter CF
is reset to zero and the consecutive eighth rate counter C8 is incremented. The logic flow
continues to step 340, where the frame rate is declared to be eighth rate.
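The FIG. 4 flow can be condensed into a small stateful class. This is a sketch under stated assumptions, not the patented implementation. All threshold values are hypothetical. The stricter SERFT2 test is applied only when C8 exceeds T8, consistent with the threshold adjustment described earlier (tighten full rate and relax eighth rate after T8 consecutive eighth rate frames). Following the text, a signaling full rate frame goes straight to step 312 without touching the counters. The step numbers in the comments refer to FIG. 4.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Hypothetical values for illustration; the text does not specify any.
    ser_ft1: int = 40  # nominal full rate threshold SERFT1
    ser_ft2: int = 20  # stricter full rate threshold SERFT2
    ser_ht: int = 30   # half rate threshold SERHT
    ser_et1: int = 8   # normal eighth rate threshold SERET1
    ser_et2: int = 12  # relaxed eighth rate threshold SERET2
    t8: int = 5        # eighth rate run length T8


@dataclass
class FrameMetrics:
    crc_full: bool
    ser_full: int
    signaling: bool  # MM/TT/TM bits indicate signaling traffic
    crc_half: bool
    ser_half: int
    ser_eighth: int


class ImprovedDeterminer:
    def __init__(self, th=None):
        self.th = th if th is not None else Thresholds()
        self.c8 = 0  # C8: consecutive eighth rate frames
        self.cf = 0  # CF: consecutive full rate frames

    def determine(self, m: FrameMetrics):
        """Return (rate, reset_filters); rate is 'full', 'half', 'eighth' or 'erasure'."""
        th = self.th
        # Steps 300-304: full rate CRC, nominal SER test, signaling check.
        if m.crc_full and m.ser_full <= th.ser_ft1:
            if m.signaling:
                return "full", False  # step 312; counters untouched, per the text
            # Steps 306/308: stricter SERFT2 only after more than T8 eighth rate frames.
            if self.c8 <= th.t8 or m.ser_full <= th.ser_ft2:
                self.c8, self.cf = 0, self.cf + 1  # step 310
                return "full", False               # step 312
        # Steps 316-322: half rate candidate.
        if m.crc_half and m.ser_half <= th.ser_ht:
            self.c8, self.cf = 0, 0  # step 320
            return "half", False     # step 322
        # Steps 324-328: normal threshold, or relaxed one after a long eighth rate run.
        if not (m.ser_eighth <= th.ser_et1
                or (self.c8 > th.t8 and m.ser_eighth <= th.ser_et2)):
            self.cf = 0              # step 330
            return "erasure", False  # step 332
        reset = self.cf == 1         # steps 334/336: lone full rate frame suspected
        self.cf, self.c8 = 0, self.c8 + 1  # step 338
        return "eighth", reset             # step 340


d = ImprovedDeterminer()
quiet = FrameMetrics(False, 90, False, False, 90, 2)
for _ in range(7):
    d.determine(quiet)
# A full rate CRC pass with a mediocre SERfull is now rejected as full rate.
print(d.determine(FrameMetrics(True, 30, False, False, 90, 3)))  # ('eighth', False)
```

In the usage lines, the frame that passes the full rate CRC with SERfull = 30 would satisfy the nominal SERFT1 test. It is refused as full rate only because it arrives after a long eighth rate run.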

An alternative embodiment could use weighted values of SERfull and SEReighth to
decide whether the full rate frame 222 or the eighth rate frame 226 was misdetermined.
In this case, the parameters WSERfull and WSEReighth could be calculated and
compared. For example, WSERfull could be calculated as WSERfull = Wfull * SERfull,
and WSEReighth as WSEReighth = Weighth * SEReighth. If WSERfull exceeds
WSEReighth, the decision could be made that the misdetermined frame was the full rate
frame 222 rather than the eighth rate frame 226, and the Reset_Filters flag could be set
to TRUE. If WSERfull is less than or equal to WSEReighth, the decision could be made
that the misdetermined frame was the current eighth rate frame 226. The current eighth
rate frame could then be declared an erasure without setting the Reset_Filters flag.
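The weighted comparison reduces to a few lines. The function name and the weight values in the usage lines are hypothetical. One plausible reason for the weights, assumed here rather than stated in the text, is to compensate for the different number of symbols over which SERfull and SEReighth are counted.

```python
def resolve_conflict(ser_full, ser_eighth, w_full, w_eighth):
    """Decide which frame of a full -> eighth step was misdetermined.

    Returns (reset_filters, erase_current_eighth_frame).
    """
    wser_full = w_full * ser_full        # WSERfull = Wfull * SERfull
    wser_eighth = w_eighth * ser_eighth  # WSEReighth = Weighth * SEReighth
    if wser_full > wser_eighth:
        return True, False   # blame full rate frame 222: set Reset_Filters
    return False, True       # blame eighth rate frame 226: erase it, no reset


print(resolve_conflict(30, 4, 1, 4))  # (True, False)
print(resolve_conflict(10, 4, 1, 4))  # (False, True)
```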

A general vocoder algorithm implements a voice production model that generally
consists of one or more digital filters. One possible model used in speech coders is the
code-excited linear prediction (CELP) model, on which many algorithms known in the
art are based. One such vocoder algorithm based on the CELP model is the EVRC
vocoder algorithm. FIG. 5 depicts the voice generation components of the EVRC
speech decoder. However, it will be recognized by those of ordinary skill in the art that
any suitable speech decoder may utilize the invention.
The excitation signal sequence is constructed from a fixed codebook excitation 400
and an adaptive codebook excitation 412. Each creates its excitation component based,
in part, on parameters transmitted within the speech frame as well as on information
from earlier decoded frames. The fixed codebook excitation 400 is regenerated by the
speech decoder based on a multi-pulse excitation scheme. The fixed codebook
excitation 400 converts the pulse information 402 into a corresponding excitation
sequence consisting of several pulses at predefined intervals. This sequence is then
filtered 406 using a single-tap finite impulse response (FIR) filter to enhance the pitch
performance of the excitation sequence. The resulting sequence is then multiplied 410
by a gain factor 408 to create the overall fixed excitation sequence. The adaptive
codebook excitation 412 is responsible for generating the pitch component of the speech
model. The speech decoder creates this excitation from a history of prior combined
excitation samples, using the pitch period delay parameter transmitted in the speech
frame. The resulting sequence is then multiplied 414 by a gain parameter 416, which is
transmitted as part of the speech frame. This creates the overall adaptive codebook
component of the excitation sequence. The two excitation components are then added
together 418 to create the overall excitation sequence. The overall excitation sequence
is then filtered using an all-pole filter 1/A(z) 420, which models the vocal tract of the
human speech production system. Finally, a post-filter W(z) 422 filters the resulting
synthesized speech sequence to enhance its perceptual quality.
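The excitation construction of FIG. 5 can be sketched in simplified form. The sketch uses signed unit pulses, an integer pitch lag, a single-tap enhancement of the form c[n] + beta * c[n - lag], and scalar gains. These are simplifying assumptions: EVRC itself uses fractional pitch lags and a specific pulse structure that are not modeled here. The function names are hypothetical. The point to notice is that the adaptive path reads from the excitation history, which is how one bad frame can leak into later frames.

```python
def fixed_excitation(pulses, length, pitch_lag, beta, gain):
    """Fixed codebook path (400/406/408/410): place signed pulses, apply a
    single-tap FIR pitch enhancement c[n] + beta*c[n - lag], then scale by gain."""
    c = [0.0] * length
    for pos, sign in pulses:
        c[pos] += sign
    enhanced = [c[n] + (beta * c[n - pitch_lag] if n >= pitch_lag else 0.0)
                for n in range(length)]
    return [gain * x for x in enhanced]


def adaptive_excitation(history, pitch_lag, length, gain):
    """Adaptive codebook path (412/414/416): repeat past combined excitation
    delayed by an integer pitch lag, then scale by gain."""
    if not 1 <= pitch_lag <= len(history):
        raise ValueError("history must cover the pitch lag")
    buf = list(history)
    out = []
    for _ in range(length):
        sample = buf[-pitch_lag]  # u[n - lag]; repeats when lag < length
        buf.append(sample)
        out.append(sample)
    return [gain * x for x in out]


def total_excitation(fixed, adaptive):
    """Adder 418: sum the two components sample by sample."""
    return [f + a for f, a in zip(fixed, adaptive)]


print(fixed_excitation([(0, 1), (3, -1)], 8, 2, 0.5, 2.0))
print(adaptive_excitation([1.0, 2.0, 3.0], 2, 4, 1.0))  # [2.0, 3.0, 2.0, 3.0]
```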

FIG. 5 shows how the filter reset control received from the enhanced determiner
150 can be used to reset the filter states, which mitigates the audio impact of the
misdetermined frame. When the filter reset indication 430 is received from the
determiner 150, the speech decoder resets the states of the various filters 412/420/422.
This operation ensures that the effects of the original misdetermination are not
extended into subsequent frames through erasure processing and filter state memories.
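Why filter memory matters, and what a reset does, can be shown with a minimal all-pole synthesis filter. Only the vocal tract filter 1/A(z) is sketched. The same idea applies to the pitch memory of 412 and to the post-filter 422. The class name and the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p are assumptions.

```python
class AllPoleFilter:
    """Synthesis filter 1/A(z), A(z) = 1 + a1 z^-1 + ... + ap z^-p (sign convention assumed)."""

    def __init__(self, coeffs):
        self.a = list(coeffs)
        self.mem = [0.0] * len(self.a)  # past outputs y[n-1] ... y[n-p]

    def reset(self):
        """Clear filter memory, as on receipt of the filter reset indication 430."""
        self.mem = [0.0] * len(self.a)

    def process(self, x):
        out = []
        for sample in x:
            y = sample - sum(a * m for a, m in zip(self.a, self.mem))
            self.mem = [y] + self.mem[:-1]
            out.append(y)
        return out


# A large sample from a falsely decoded frame keeps "ringing" through the memory.
f = AllPoleFilter([-0.5])          # y[n] = x[n] + 0.5 * y[n-1]
f.process([8.0])
print(f.process([0.0, 0.0, 0.0]))  # [4.0, 2.0, 1.0]
f.process([8.0])
f.reset()
print(f.process([0.0, 0.0, 0.0]))  # [0.0, 0.0, 0.0]
```

Without the reset, silent input still produces a decaying tail from the bad frame. With the reset, the tail is removed at the frame boundary.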

The adaptive codebook excitation 412 contains a pitch filter that is used to
generate the pitch component of the synthesized speech sequence. This filter consists of
a memory of past combined excitation samples that is cleared when the filter reset
indication 430 is received. The vocal tract filter 420 and the post-filter 422 also contain
filter memory that could extend the audio impact beyond the initial misdetermination,
so these filters are also reset. Note that it is not necessary to reset the fixed codebook
pitch enhancement filter, since it uses no memory from prior frames. In addition to the
filter reset operation, the speech decoder could disregard the imposed rate transition
rules, since it knows that the determiner 150 determined the prior full rate frame in
error.

The filter reset control operation has been described in terms of the preferred
embodiment. However, an alternative embodiment could additionally reset the
excitation gain parameters 408/416 and allow normal enforcement of the rate transition
rules. Resetting the gain parameters 408/416 ensures that the excitation signal presented
to the vocal tract filter 420 is effectively nullified. In this way the speech decoder could
mitigate the audio impact of both the misdetermination and the erasure processing
induced by the rate transition rules.

Another alternative embodiment could initialize the filters 412/420/422 with
states that produce a more perceptually pleasing transition between the audio
produced by the misdetermined frame and the expected background signal. One such
filter state initialization could be to reload the filter states that existed prior to the
frame misdetermination.
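Reloading earlier states requires the decoder to remember them. In FIG. 4, the reset indication is raised while the frame after the suspect one is being determined (frame 226 after frame 222). The decoder must therefore keep the states from the start of the previous frame. The sketch below assumes a hypothetical `FilterStateKeeper` helper and represents filter states as a plain dictionary.

```python
import copy


class FilterStateKeeper:
    """Hypothetical helper that remembers the filter states in effect at the
    start of the previous frame so they can be reloaded after a misdetermination."""

    def __init__(self, initial_state):
        self.state = copy.deepcopy(initial_state)
        self._start_of_previous_frame = None

    def begin_frame(self, reset_indication=False):
        """Call once before decoding each frame."""
        if reset_indication and self._start_of_previous_frame is not None:
            # Reload the states that existed before the suspect (previous) frame.
            self.state = copy.deepcopy(self._start_of_previous_frame)
        self._start_of_previous_frame = copy.deepcopy(self.state)


keeper = FilterStateKeeper({"lpc_memory": [0.0, 0.0]})
keeper.begin_frame()                       # start of eighth rate frame 220
keeper.state["lpc_memory"] = [0.2, 0.1]    # states after decoding frame 220
keeper.begin_frame()                       # start of frame 222
keeper.state["lpc_memory"] = [9.0, -7.0]   # corrupted by the false full rate frame
keeper.begin_frame(reset_indication=True)  # start of frame 226
print(keeper.state["lpc_memory"])          # [0.2, 0.1]
```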

FIG. 6 illustrates the improvement in audio impact that is realized by the artifact
mitigation portion of the invention. Each plot is composed of a timeline containing
three speech frames. The first plot illustrates the audio impact of a full rate frame
misdetermination when the artifact mitigation scheme is not utilized. Its three speech
frames are the misdetermined frame 500, a frame of erasure processing 502 induced by
the rate transition rule, and a frame 504 showing the prolonged effects of the filter state
memories.

The second plot illustrates the audio improvement realized by utilizing the artifact
mitigation scheme according to the preferred embodiment of the invention. The first
frame 506 shows the effects of a misdetermination that escaped the RDA detection
phase. The second frame 508 and third frame 510 show how the effect of the escaped
misdetermination is contained. The filter states are reset and the speech decoder is
allowed to disregard the rate transition rule for detected misdeterminations. This
shortens the overall artifact and produces a less objectionable audio impact for the
listener.

The invention has been described in terms of several preferred embodiments. These
preferred embodiments are meant to be illustrative of the invention, and not limiting of
its broad scope, which is set forth in the following claims.